tl;dr

This work expands on the Beta-to-Release matching proof of concept by validating the technique across versions. The matching model was trained on v67 Desktop Firefox data to find Beta profiles that were representative of Release. These matched profiles were then compared against Release on v68 performance and engagement metrics. The v68 results are similar to those observed for v67, which suggests that the methodology can be used to calculate additional Firefox Release Health metrics derived from Beta populations.

The following tables show the relative difference between Beta and Release in the training (v67) and validation (v68) datasets, for the mean and the median respectively.

Beta-Release Difference: Mean

metric                             pre-matching: v67   post-matching: v67  pre-matching: v68   post-matching: v68
CONTENT_PAINT_TIME_CONTENT         0.1281559           0.0054015           0.1478562           0.0214504
TIME_TO_DOM_CONTENT_LOADED_END_MS  0.2356492           0.0004005           0.3323291           0.0393589
MEMORY_TOTAL                       0.7393386           0.6380770           0.6108449           0.6208545
TIME_TO_DOM_COMPLETE_MS            0.3833827           0.0532963           0.4846121           0.0709876
FX_PAGE_LOAD_MS_2_PARENT           0.1917986           0.0187386           0.2650356           0.0227569
FX_TAB_SWITCH_TOTAL_E10S_MS        0.3048868           0.0441628           0.3245591           0.0232778
CONTENT_FRAME_TIME_GPU             0.0979446           0.0229548           0.0868224           0.0126205
COMPOSITE_TIME_GPU                 0.2107906           0.0994379           0.0128613           0.0753851
TIME_TO_LOAD_EVENT_END_MS          0.4134848           0.0559891           0.5148570           0.0748421
startup_ms                         0.3682498           0.2795177           3.5766275           3.8479595

Beta-Release Difference: Median

metric                             pre-matching: v67   post-matching: v67  pre-matching: v68   post-matching: v68
CONTENT_PAINT_TIME_CONTENT         0.0702651           0.0083898           0.0777329           0.0212776
TIME_TO_DOM_CONTENT_LOADED_END_MS  0.1576861           0.0332697           0.2705283           0.0172085
MEMORY_TOTAL                       0.5028295           0.4675830           0.4233655           0.4738329
TIME_TO_DOM_COMPLETE_MS            0.2033473           0.0213340           0.3147478           0.0275340
FX_PAGE_LOAD_MS_2_PARENT           0.1579308           0.0126864           0.2374249           0.0072225
FX_TAB_SWITCH_TOTAL_E10S_MS        0.1034576           0.0722753           0.1487312           0.0127165
CONTENT_FRAME_TIME_GPU             0.0400662           0.0112279           0.0317406           0.0019684
COMPOSITE_TIME_GPU                 0.1236804           0.0850397           0.0303801           0.0670938
TIME_TO_LOAD_EVENT_END_MS          0.2201331           0.0188157           0.3314132           0.0338250
startup_ms                         0.3451365           0.1348645           0.4706603           0.2571110
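
The tables above report relative differences, but the report does not spell out the formula behind them. A minimal sketch, assuming the relative difference is the absolute gap between the Beta and Release statistic divided by the Release statistic. The function name and the sample values are illustrative and are not taken from the analysis code.

```python
from statistics import mean, median

def relative_difference(beta, release, stat=mean):
    """Absolute Beta-Release gap in `stat`, scaled by the Release value.

    Assumed definition: |stat(beta) - stat(release)| / |stat(release)|.
    """
    release_value = stat(release)
    if release_value == 0:
        raise ValueError("Release statistic is zero; relative difference undefined")
    return abs(stat(beta) - release_value) / abs(release_value)

# Hypothetical page-load samples in milliseconds (not real telemetry).
beta_ms = [900, 1100, 1300, 2500]
release_ms = [800, 1000, 1200, 1400]

mean_diff = relative_difference(beta_ms, release_ms, stat=mean)
median_diff = relative_difference(beta_ms, release_ms, stat=median)
```

In this toy sample the single slow Beta profile moves the mean far more than the median, which is one reason to report both statistics side by side.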

Problem Statement

There is significant utility in finding representative Beta populations of Firefox that can give insight into Release before its launch. Previous work showed that statistical matching can find a subset of Beta that is representative of Release with respect to performance metrics. The real-world use case, however, is to train the model on one version (here v67), find the matched clients, and then apply them to a subsequent version. This work further validates the technique by following that use case:

  1. Train a statistical matching model on v67 Firefox data that matches Beta to Release profiles.
  2. Extract the matched v67 Beta profiles.
  3. Subset v68 Beta profiles by the matched v67 profiles.
  4. Measure the difference in performance and engagement between Beta and Release before and after matching.

Methodology

Data Preparation

The code that exported the data is available here. The filters are similar to those applied in the previous work:

  • Desktop Firefox
  • Two weeks of collection per profile, starting with the first observed ping within the date window
  • en-US, en-GB locales
  • US, GB countries
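
The per-profile collection window in the list above can be sketched as follows. This is an illustration rather than the export code: the ping layout (`client_id`, date pairs), the function name, and the choice to read "two weeks" as 14 days counted from the first ping are all assumptions.

```python
from datetime import date, timedelta

def two_week_window(pings, window_start, window_end, days=14):
    """Keep each profile's pings from its first ping inside the date window
    up to (but not including) `days` days later.

    `pings` is a list of (client_id, ping_date) tuples (assumed layout).
    """
    # First observed ping per profile within the channel's date window.
    first_seen = {}
    for client_id, ping_date in pings:
        if window_start <= ping_date <= window_end:
            if client_id not in first_seen or ping_date < first_seen[client_id]:
                first_seen[client_id] = ping_date

    span = timedelta(days=days)
    return [
        (client_id, ping_date)
        for client_id, ping_date in pings
        if client_id in first_seen
        and first_seen[client_id] <= ping_date < first_seen[client_id] + span
    ]

# Example window and pings (hypothetical values).
pings = [
    ("a", date(2019, 5, 20)),  # before the window: ignored
    ("a", date(2019, 5, 22)),  # first ping in the window for "a"
    ("a", date(2019, 6, 4)),   # 13 days later: kept
    ("a", date(2019, 6, 5)),   # 14 days later: dropped
    ("b", date(2019, 7, 1)),   # never seen inside the window: dropped
]
kept = two_week_window(pings, date(2019, 5, 21), date(2019, 6, 18))
```

Anchoring the window on each profile's first ping, rather than on a fixed calendar range, gives every profile the same observation length regardless of when it first appeared.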

Training

The following filters make up the training dataset used in statistical matching:

  • Version 67
  • Beta dates: last four weeks (04/23/2019 - 05/21/2019)
  • Release dates: first four weeks (05/21/2019 - 06/18/2019)
  • Composition:

    channel    count
    beta       80052
    release    19948

Validation

The following filters constitute the validation dataset:

  • Version 68
  • Beta dates: last four weeks (06/04/2019 - 07/09/2019)
  • Release dates: first four weeks (07/09/2019 - 08/06/2019)
  • Composition (before matching and subsetting of Beta):

    channel    count
    beta       91502
    release    391806

Model

The code that performed the modeling is available here. The best-performing model was trained on the v67 dataset with the following configuration:

  • Model: Nearest Neighbors with Mahalanobis distance measure
  • Beta oversampling: 4x Beta to Release
  • Covariates (Model Features): search_count, daily_max_tabs, daily_num_sessions_started, num_addons, num_bookmarks, profile_age, timezone_offset, cpu_speed_mhz, memory_mb and cpu_cores

The final result of this model is a subset of Beta profiles most representative of Release.
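
To make the matching step concrete, here is a minimal sketch of nearest-neighbor matching under a Mahalanobis distance. It is not the modeling code referenced above. It assumes greedy 1:1 matching without replacement, which the report does not state, and it uses only two covariates so that the inverse covariance matrix can be written out by hand. All function names and profile values are hypothetical.

```python
from statistics import mean

def inverse_covariance_2d(rows):
    """Inverse of the 2x2 sample covariance matrix of `rows` (pairs of floats)."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = mean(xs), mean(ys)
    n = len(rows) - 1
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in rows) / n
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return ((syy / det, -sxy / det), (-sxy / det, sxx / det))

def mahalanobis_sq(u, v, inv_cov):
    """Squared Mahalanobis distance between two 2-d points."""
    dx, dy = u[0] - v[0], u[1] - v[1]
    (a, b), (c, d) = inv_cov
    return dx * (a * dx + b * dy) + dy * (c * dx + d * dy)

def match_beta_to_release(beta, release):
    """For each Release profile, pick the closest Beta profile not yet used.

    `beta` and `release` map client_id -> (covariate_1, covariate_2).
    Greedy matching depends on the order in which Release profiles are visited.
    """
    if len(beta) < len(release):
        raise ValueError("need at least as many Beta as Release profiles")
    # Covariance is estimated on the pooled sample (an assumption).
    inv_cov = inverse_covariance_2d(list(beta.values()) + list(release.values()))
    unused = dict(beta)
    matches = {}
    for rel_id, rel_x in release.items():
        best = min(unused, key=lambda b: mahalanobis_sq(unused[b], rel_x, inv_cov))
        matches[rel_id] = best
        del unused[best]
    return matches

# Hypothetical profiles: (daily_max_tabs, memory_mb).
beta = {
    "b1": (2.0, 4096.0),
    "b2": (10.0, 16384.0),
    "b3": (4.0, 8192.0),
    "b4": (30.0, 32768.0),
}
release = {"r1": (3.0, 8192.0), "r2": (9.0, 16384.0)}
matches = match_beta_to_release(beta, release)
```

The Mahalanobis distance rescales each covariate by its variance and accounts for correlation between covariates, so a difference of a few tabs is not swamped by a difference of thousands of megabytes. Having more Beta than Release candidates, as with the 4x oversampling above, gives each Release profile several candidates to choose from.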

Validation

The next step is to subset the validation v68 dataset by these matched Beta profiles. This reduces the Beta sample size used in the subsequent analysis:

  • v67 Beta subset: 19948 distinct profiles
  • v68 Beta subset: 13452 distinct profiles
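
The subsetting step amounts to an intersection of profile identifiers. The sketch below assumes profiles carry a stable identifier across versions (called a client ID here, a hypothetical name), and also works out the retention implied by the two counts above.

```python
def subset_by_matched(v68_beta_ids, matched_v67_beta_ids):
    """Return the v68 Beta client IDs that were also matched in v67."""
    return set(v68_beta_ids) & set(matched_v67_beta_ids)

# Toy identifiers (hypothetical).
example_subset = subset_by_matched(["a", "b", "c"], ["b", "c", "d"])

# Retention implied by the counts reported above.
v67_matched = 19948
v68_retained = 13452
retention = v68_retained / v67_matched  # roughly two thirds of matched profiles
```

Under these counts about 67% of the matched v67 Beta profiles are still present in the v68 Beta data, so the validation comparison runs on a smaller sample than the one used in training.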

The following plots show the covariate distributions for these subsets:

  • Beta v68: pre-matching
  • Beta v68: matched and subsetted
  • Release v68

NOTE: Guiding lines have been added for the following:

  • red dashed: Release mean
  • blue dashed: Release median
  • green dashed: subsetted Beta mean

Holdout Covariates

The same set of performance metrics as in the previous analysis was held out from matching-model training and used as a model diagnostic:

  • CONTENT_PAINT_TIME_CONTENT
  • TIME_TO_DOM_CONTENT_LOADED_END_MS
  • MEMORY_TOTAL
  • TIME_TO_DOM_COMPLETE_MS
  • FX_PAGE_LOAD_MS_2_PARENT
  • FX_TAB_SWITCH_TOTAL_E10S_MS
  • CONTENT_FRAME_TIME_GPU
  • COMPOSITE_TIME_GPU
  • TIME_TO_LOAD_EVENT_END_MS
  • startup_ms
  • content_crashes

Training Dataset: v67

Validation Dataset: v68

Training Covariates

The following covariates were used in training the v67 model. Note that the model was trained on the numerical versions of the environment covariates, which have been converted to categories for plotting.

Beta-Release Difference: Mean

metric                      pre-matching: v67   post-matching: v67  pre-matching: v68   post-matching: v68
active_hours                0.3266487           0.2682378           0.1585895           0.0879519
daily_max_tabs              0.3935228           0.0292538           0.3423212           0.0230131
daily_tabs_opened           0.2595199           0.0389816           0.2416875           0.0185330
search_count                0.2790857           0.0874923           0.1076707           0.1117880
daily_unique_domains        0.0430361           0.0245319           0.0004563           0.0088909
daily_num_sessions_started  0.0515823           0.0341498           0.0461421           0.0784622
num_bookmarks               0.4286822           0.1172689           0.4279918           0.1348414
num_addons                  0.2277464           0.0675241           0.2137228           0.0622400
num_active_days             0.3533521           0.3027796           0.1752781           0.0643182
num_pages                   0.0012293           0.1151344           0.0587843           0.0895890
uri_count                   0.3271315           0.2571066           0.1581423           0.0658028
session_length              0.2661923           0.2795123           0.0956491           0.0036550
profile_age                 0.0263616           0.0021744           0.0421692           0.0193793

Beta-Release Difference: Median

metric                      pre-matching: v67   post-matching: v67  pre-matching: v68   post-matching: v68
active_hours                0.4538899           0.3589987           0.3630908           0.1583799
daily_max_tabs              0.1428571           0.0000000           0.1142857           0.0069444
daily_tabs_opened           0.1000000           0.0000000           0.0208333           0.0700809
search_count                0.4                 0.2                 0.4                 0.0
daily_unique_domains        0.0000000           0.0000000           0.0222222           0.0155440
daily_num_sessions_started  0.0476190           0.0285714           0.0476190           0.0699301
num_bookmarks               0.0643939           0.0931818           0.1450000           0.0545455
num_addons                  0.2                 0.0                 0.2                 0.0
num_active_days             0.3333333           0.3333333           0.3000000           0.0909091
num_pages                   0.3316079           0.4344796           0.4338290           0.3409352
uri_count                   0.4914611           0.3966165           0.4165103           0.1929825
session_length              0.4298208           0.3671831           0.3363515           0.0746248
profile_age                 0.0015267           0.0120301           0.0236220           0.0274657

Training Dataset Continuous Covariates: v67

Training Dataset Categorical Covariates: v67

Validation Dataset Continuous Covariates: v68

Validation Dataset Categorical Covariates: v68

Discussion

The matched subset was about as representative of Release in v68 as in v67 for most of the covariates reviewed. For some covariates, however, the difference between channels increased after matching (e.g., profile_age, default_search_engine), and MEMORY_TOTAL remained distinctly different from Release both before and after matching. MEMORY_TOTAL requires further investigation into why its Beta distribution is spread toward significantly higher values than the Release distribution.
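
One way to screen for metrics where matching widened the gap is to compare the pre- and post-matching differences directly. The sketch below uses the v68 mean differences from the performance table at the top of this report. The function name is illustrative, and a strict `post > pre` comparison is only one possible criterion; it ignores sampling noise.

```python
# v68 relative differences in the mean, as (pre-matching, post-matching),
# copied from the performance table in this report.
v68_mean_diff = {
    "CONTENT_PAINT_TIME_CONTENT": (0.1478562, 0.0214504),
    "TIME_TO_DOM_CONTENT_LOADED_END_MS": (0.3323291, 0.0393589),
    "MEMORY_TOTAL": (0.6108449, 0.6208545),
    "TIME_TO_DOM_COMPLETE_MS": (0.4846121, 0.0709876),
    "FX_PAGE_LOAD_MS_2_PARENT": (0.2650356, 0.0227569),
    "FX_TAB_SWITCH_TOTAL_E10S_MS": (0.3245591, 0.0232778),
    "CONTENT_FRAME_TIME_GPU": (0.0868224, 0.0126205),
    "COMPOSITE_TIME_GPU": (0.0128613, 0.0753851),
    "TIME_TO_LOAD_EVENT_END_MS": (0.5148570, 0.0748421),
    "startup_ms": (3.5766275, 3.8479595),
}

def widened_after_matching(diffs):
    """Names of metrics whose Beta-Release difference grew after matching."""
    return [name for name, (pre, post) in diffs.items() if post > pre]

flagged = widened_after_matching(v68_mean_diff)
```

For the v68 means this flags MEMORY_TOTAL, COMPOSITE_TIME_GPU, and startup_ms, which are natural candidates for the follow-up investigation.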

A hold-out set of performance metrics is not strictly necessary when applying the model across versions. However, research has shown that optimal feature selection for statistical matching is driven by the effects (e.g., the hold-out covariates) rather than by the response (e.g., whether a profile is Beta or Release) that typically drives predictive modeling. Knowing the metrics of concern before matching is therefore key.

Next Steps

This methodology is an initial step towards providing an additional set of Firefox Release Health metrics derived from the Beta release population. Additional work to realize this goal includes:

  • Address the full country and locale distributions
  • Investigate other model architectures that utilize nominal/categorical covariates
  • Additional feature/covariate generation
    • locale
    • additional environmental (e.g., HDD/SSD)
    • additional performance metrics
      • Especially those influencing WebRender and Fission
    • crash
  • Research into covariate selection for matching models